start



Set up an audio clip/fingerprint 
database 



202 



Receive an unlabeled audio 
clip from a user 



204 



Process the unlabeled audio 
clip with the audio fingerprint 
generator to extract an audio 
fingerprint 



206 



Compare the extracted audio
fingerprint with the audio
fingerprints stored in the audio
clip/fingerprint database



208




Indicate to the user that the 
unlabeled audio clip cannot be 
identified 



212



Use the stored audio
fingerprint to determine the
label of the audio clip






Use the label to retrieve 
catalogue information about 
the audio clip and report it to 
the user 




216



FIG. 2
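
The identification flow of FIG. 2 can be sketched roughly as follows. This is a minimal illustration, not the method the figure prescribes: the figure does not say how fingerprints are compared, so the Hamming-distance comparison, the `max_distance` threshold, and the names `hamming_distance`, `identify`, `fingerprints` and `catalogue` are all assumptions introduced here. A fingerprint is modelled as a list of integer sub-fingerprints.

```python
from typing import Dict, Optional, Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Count differing bits between two equal-length lists of sub-fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def identify(
    query: Sequence[int],
    fingerprints: Dict[str, Sequence[int]],
    catalogue: Dict[str, dict],
    max_distance: int = 2,  # assumed tolerance; not given in the figure
) -> Optional[dict]:
    """Return catalogue information for the closest stored fingerprint,
    or None when no stored fingerprint is close enough (clip not identified)."""
    best_label = None
    best_distance = max_distance + 1
    for label, stored in fingerprints.items():
        distance = hamming_distance(query, stored)
        if distance < best_distance:
            best_label, best_distance = label, distance
    if best_label is None:
        return None  # corresponds to "cannot be identified"
    return catalogue[best_label]  # label -> catalogue information


# Hypothetical data, for illustration only.
fingerprints = {"clip-1": [0b1010, 0b1111], "clip-2": [0b0000, 0b0001]}
catalogue = {"clip-1": {"title": "Example A"}, "clip-2": {"title": "Example B"}}
match = identify([0b1010, 0b1110], fingerprints, catalogue)
no_match = identify([0b0101, 0b0000], fingerprints, catalogue)
```

Returning `None` stands in for the "cannot be identified" branch; a real system would report that outcome to the user instead.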



Populate the audio clip/ 
fingerprint database with 
audio clips 



302 



For an audio clip in the audio clip/
fingerprint database, process the
audio clip with the audio
fingerprint generator to extract an
audio fingerprint. Store the audio
fingerprint in the database.



Use the audio fingerprint to 
label the audio clip. Store the 
label in the database. 



306



Link the label to catalogue 
information (metadata) about 
the audio clip 



308



310



yes

Is there another audio clip
to be processed in the
database?



no 



FIG. 3 
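
The database-population loop of FIG. 3 might look like the sketch below. The figure does not specify how a label is derived from a fingerprint, so `make_label` (a truncated SHA-1 digest) is purely an assumption, as are the names `Entry`, `populate`, `extract_fingerprint` and the dictionary-based storage. The fingerprint generator is passed in as a callable and is not implemented here.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class Entry:
    fingerprint: bytes  # stored audio fingerprint
    label: str          # label derived from the fingerprint
    metadata: dict      # catalogue information linked to the label


def make_label(fingerprint: bytes) -> str:
    """Assumed labelling scheme: a short digest of the fingerprint bytes."""
    return hashlib.sha1(fingerprint).hexdigest()[:12]


def populate(
    clips: Dict[str, Sequence[int]],
    extract_fingerprint: Callable[[Sequence[int]], bytes],
    catalogue: Dict[str, dict],
) -> Dict[str, Entry]:
    """Fingerprint, label and link every clip, one clip per loop iteration."""
    database: Dict[str, Entry] = {}
    for clip_id, samples in clips.items():  # "is there another audio clip?"
        fingerprint = extract_fingerprint(samples)
        label = make_label(fingerprint)
        database[clip_id] = Entry(fingerprint, label, catalogue.get(clip_id, {}))
    return database


# Hypothetical clips and a stand-in fingerprint generator, for illustration only.
clips = {"a": [1, 2, 3], "b": [4, 5]}
catalogue = {"a": {"title": "Example A"}}
database = populate(clips, lambda samples: bytes(samples), catalogue)
```

Clips without catalogue information get an empty metadata dictionary in this sketch; whether that is acceptable depends on the application.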



Flowchart to generate 
an audio fingerprint 
The audio signal is down-
sampled into a mono audio
stream for processing.
(44.1/48 kHz --> 5 kHz (mono))



404 




Process the down-sampled audio signal by generating
frequency domain coefficients by first segmenting the
signal into frames and then doing inverse discrete cosine
transform to capture important properties of the signal.
(s(i) = Σ_k cos[(π/64)(2i+1)(k-16)] y(k), k=0..63, i=0..31,
where 64 y(k) samples are derived from 32 input audio
samples after some windowing, shift and add operations)



Perform feature extraction of the audio 
samples to further analyze the data for a 
more compact data representation. 
(V(n,i) = Variance (s(i), s(0)), 
where V(n, i) denotes energy variance for 
band i of frame n) 



Pack the compact data representation
into a sub-fingerprint form factor
(F(n,i) <-- 1, if V(n,i) is less than V(n, i+1),
V(n-1, i) [illegible] V(n-1, i+1),
else F(n,i) <-- 0, where F(n,i) denotes
i-th bit of the sub-fingerprint of frame
n)
end



FIG. 4 
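
The transform, feature-extraction and bit-packing boxes of FIG. 4 can be illustrated with the sketch below. It rests on several assumptions. It starts from the 64 y(k) values and leaves out the down-sampling and the windowing, shift and add operations, which the figure does not detail. It reads V(n, i) as the variance of band i over the blocks that make up one frame. It also derives each bit only from the comparison of V(n, i) with V(n, i+1), because the part of the rule involving frame n-1 could not be recovered. The function names are hypothetical.

```python
import math
from statistics import pvariance
from typing import List, Sequence

NUM_BANDS = 32   # i = 0..31, as in the figure
NUM_COEFFS = 64  # k = 0..63, as in the figure


def matrixing(y: Sequence[float]) -> List[float]:
    """s(i) = sum over k of cos[(pi/64)(2i+1)(k-16)] * y(k)."""
    if len(y) != NUM_COEFFS:
        raise ValueError("expected 64 y(k) values")
    return [
        sum(
            math.cos(math.pi / 64 * (2 * i + 1) * (k - 16)) * y[k]
            for k in range(NUM_COEFFS)
        )
        for i in range(NUM_BANDS)
    ]


def band_variances(blocks: Sequence[Sequence[float]]) -> List[float]:
    """Assumed reading of V(n, i): variance of band i across one frame's blocks."""
    outputs = [matrixing(y) for y in blocks]
    return [pvariance([s[i] for s in outputs]) for i in range(NUM_BANDS)]


def sub_fingerprint(variances: Sequence[float]) -> List[int]:
    """F(n, i) = 1 if V(n, i) < V(n, i+1), else 0 (simplified rule, see above)."""
    return [
        1 if variances[i] < variances[i + 1] else 0
        for i in range(len(variances) - 1)
    ]


# Hypothetical frame of two blocks: a unit impulse at k = 16, then silence.
impulse = [0.0] * NUM_COEFFS
impulse[16] = 1.0
frame = [impulse, [0.0] * NUM_COEFFS]
bits = sub_fingerprint(band_variances(frame))
```

With 32 bands, comparing neighbouring bands yields 31 bits per frame in this sketch; the figure itself does not state the sub-fingerprint width.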